Confirmation detection in human-agent interaction using non-lexical speech cues
Abstract
Even if only the acoustic channel is considered, human communication is highly multi-modal. Non-lexical cues provide a variety of information such as emotion or agreement. The ability to process such cues is highly relevant for spoken dialog systems, especially in assistance systems. In this paper, we focus on the recognition of non-lexical confirmations such as "mhm", as they enhance the system's ability to accurately interpret human intent in natural communication. We implemented and evaluated a system for online detection of non-lexical confirmations. The architecture uses a Support Vector Machine to detect confirmations based on acoustic features. In a systematic comparison, several feature sets were evaluated for their performance on a corpus of human-agent interaction in a setting with naive users, including elderly and cognitively impaired people. Our results show that using stacked formants as features yields an accuracy of 84%, outperforming regular formants and MFCC- or pitch-based features for online classification.
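The abstract does not define how formants are "stacked". As a minimal sketch, assuming that stacking means concatenating the formant values of each frame with those of its neighbouring frames, the feature construction could look like the following. The function name stack_frames, the context parameter, the edge padding, and the sample formant values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def stack_frames(frames: Sequence[Sequence[float]], context: int = 2) -> List[List[float]]:
    """Concatenate each frame with `context` neighbours on either side.

    Assumption: edge frames are padded by repeating the first or last
    frame, so the output has one stacked vector per input frame.
    """
    if context < 0:
        raise ValueError("context must be non-negative")
    if not frames:
        return []
    last = len(frames) - 1
    stacked: List[List[float]] = []
    for i in range(len(frames)):
        vector: List[float] = []
        for offset in range(-context, context + 1):
            # Clamp the index so that edges reuse the nearest valid frame.
            j = min(max(i + offset, 0), last)
            vector.extend(frames[j])
        stacked.append(vector)
    return stacked


# Hypothetical formant tracks (F1, F2 in Hz) for four consecutive frames.
formants = [[500.0, 1500.0], [510.0, 1490.0], [520.0, 1480.0], [530.0, 1470.0]]
stacked = stack_frames(formants, context=1)
print(len(stacked), len(stacked[0]))
```

Each stacked vector could then be passed to a Support Vector Machine as one training or test sample. Because every vector only needs a fixed window of frames, this kind of feature is compatible with online classification, at the cost of a delay of `context` frames.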
Similar resources
Extracting the acoustic features of interruption points using non-lexical prosodic analysis
Non-lexical prosodic analysis is our term for the process of extracting prosodic structure from a speech waveform without reference to the lexical contents of the speech. It has been shown that human subjects are able to perceive prosodic structure within speech without lexical cues. There is some evidence that this extends to the perception of disfluency, for example, the detection interruptio...
Head gestures for perceptual interfaces: The role of context in improving recognition
Head pose and gesture offer several conversational grounding cues and are used extensively in face-to-face interaction among people. To accurately recognize visual feedback, humans often use contextual knowledge from previous and current events to anticipate when feedback is most likely to occur. In this paper we describe how contextual information can be used to predict visual feedback and imp...
Experiments in context-independent recognition of non-lexical 'yes' or 'no' responses
We present our experiments in context-free recognition of non-lexical responses. Non-lexical verbal responses such as mmm-hmm or uh-huh are used by listeners to signal confirmation, uncertainty in understanding, agreement or disagreement in speech-based interaction between humans. Correct recognition of these utterances by speech interfaces can lead to a more natural interaction paradigm with c...
Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs
Setting of data collection: Phone account information, Automated dialog system; Medical emergencies, Human-Human interactions. Motivation: Improve customer satisfaction; Study real-life speech in highly emotive situations. Studied emotions: Negative, non-negative (but 7 emotions annotated); Anger, Fear, Relief, Sadness (but finer-grained annotation). Corpus used for experiments: 5690 dialogs, 20,013 user tu...
Real-life emotions detection with lexical and paralinguistic cues on Human-Human call center dialogs
The emotion detection work reported here is part of a larger study aiming to model user behavior in real interactions. We already studied emotions in a real-life corpus with human-human dialogs on a financial task. We now make use of another corpus of real agent-caller spoken dialogs from a medical emergency call center in which emotion manifestations are much more complex, and extreme emotions...
Journal title:
- CoRR
Volume abs/1710.00171
Pages -
Publication date 2017